
DeepSeek · Chat / LLM · 70B Parameters · 64K Context

Streaming · Reasoning · Chain-of-Thought · Code · Long Context · Chat
Overview
DeepSeek R1 Distill LLaMA 70B is a knowledge-distilled reasoning model built on the LLaMA-3.1-70B architecture, trained on high-quality reasoning outputs from DeepSeek R1. It delivers near frontier-level analytical performance while running on significantly smaller hardware than the full R1 model, making it ideal for teams that need powerful chain-of-thought reasoning without the infrastructure overhead of a 671B-parameter system. Served instantly via the Qubrid AI Serverless API.
Frontier reasoning. Distilled efficiency. Run DeepSeek R1 intelligence on Qubrid AI: no GPUs, no setup, no ops.
Model Specifications
| Field | Details |
|---|---|
| Model ID | deepseek-ai/deepseek-r1-distill-llama-70b |
| Provider | DeepSeek |
| Kind | Chat / LLM |
| Architecture | LLaMA-3.1-70B (Distilled) |
| Parameters | 70B |
| Context Length | 64,000 Tokens |
| MoE | No |
| Release Date | January 2025 |
| License | DeepSeek R1 License (MIT) |
| Training Data | Distilled from high-quality DeepSeek R1 reasoning outputs onto the LLaMA-3.1-70B base |
| Function Calling | Not Supported |
| Image Support | N/A |
| Serverless API | Available |
| Fine-tuning | Coming Soon |
| On-demand | Coming Soon |
| State | Ready |
Pricing
Access via the Qubrid AI Serverless API with pay-per-token pricing. No infrastructure management required.
| Token Type | Price per 1M Tokens |
|---|---|
| Input Tokens | $1.20 |
| Output Tokens | $1.80 |
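As a quick sanity check on the rates above, per-request cost can be estimated in a few lines of Python (a simple sketch; prices are per one million tokens):

```python
# Pay-per-token rates for DeepSeek R1 Distill LLaMA 70B on Qubrid AI
INPUT_PRICE_PER_M = 1.20   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 1.80  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a 2,000-token prompt with a 10,000-token reasoning trace:
# estimate_cost(2_000, 10_000) -> roughly $0.02
```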
Quickstart
Prerequisites
- Create a free account at platform.qubrid.com
- Generate your API key from the API Keys section
- Replace QUBRID_API_KEY in the code below with your actual key
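A minimal Python sketch using only the standard library. The endpoint URL below is a placeholder, and the request/response shape assumes the OpenAI-compatible chat completions format; check the Qubrid docs for the actual base URL:

```python
import json
import urllib.request

QUBRID_API_KEY = "QUBRID_API_KEY"  # replace with your actual key
BASE_URL = "https://platform.qubrid.com/v1"  # placeholder; see the Qubrid docs

def build_payload(prompt: str, stream: bool = True,
                  temperature: float = 0.3, max_tokens: int = 10000) -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": "deepseek-ai/deepseek-r1-distill-llama-70b",
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

def chat(prompt: str) -> str:
    """Send a non-streaming request and return the model's reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(prompt, stream=False)).encode(),
        headers={
            "Authorization": f"Bearer {QUBRID_API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Call `chat("Explain quantum computing in simple terms")` once your key is in place; the official SDK examples in JavaScript, Go, and cURL follow the same request shape.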
Live Example
Prompt: Explain quantum computing in simple terms
Response:
Playground Features
The Qubrid AI Playground lets you interact with DeepSeek R1 Distill LLaMA 70B directly in your browser: no setup, no code, no cost to explore.
System Prompt
Shape the model's reasoning approach, output format, and domain focus before the conversation begins: ideal for technical assistants, structured analysis pipelines, and multi-turn problem-solving workflows.
Set your system prompt once in the Qubrid Playground and it applies across every turn of the conversation.
Few-Shot Examples
Guide the model's reasoning depth and output structure with concrete examples: no fine-tuning, no retraining required.
| User Input | Assistant Response |
|---|---|
| What is the time complexity of merge sort? | Merge sort has O(n log n) time complexity in all cases: best, average, and worst. This is because the array is divided log n times and each division requires O(n) work to merge. |
| Solve: if 3x + 7 = 22, what is x? | Step 1: Subtract 7 from both sides: 3x = 15. Step 2: Divide by 3: x = 5. |
Tip: Stack multiple few-shot examples in the Qubrid Playground to shape reasoning style and output format, no fine-tuning required.
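The same few-shot pattern works over the API as well, by prepending alternating user/assistant turns ahead of the real question. A sketch, assuming an OpenAI-style messages array:

```python
# Few-shot examples mirroring the Playground table above
few_shot = [
    {"role": "user", "content": "What is the time complexity of merge sort?"},
    {"role": "assistant", "content": "Merge sort is O(n log n) in all cases: "
                                     "log n divisions, each merged in O(n)."},
    {"role": "user", "content": "Solve: if 3x + 7 = 22, what is x?"},
    {"role": "assistant", "content": "Step 1: Subtract 7 from both sides: 3x = 15. "
                                     "Step 2: Divide by 3: x = 5."},
]

def with_examples(question: str) -> list:
    """Prepend the few-shot examples to the user's actual question."""
    return few_shot + [{"role": "user", "content": question}]
```

Pass the resulting list as the `messages` field of a chat completion request; the model picks up the demonstrated step-by-step style for the final question.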
Inference Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| Streaming | boolean | true | Enable streaming responses for real-time output |
| Temperature | number | 0.3 | Controls creativity and randomness. Higher values produce more diverse output |
| Max Tokens | number | 10000 | Defines the maximum number of tokens the model is allowed to generate |
| Top P | number | 1 | Nucleus sampling: limits token selection to a subset of top probability mass |
| Reasoning Effort | select | medium | Adjusts the depth of reasoning and problem-solving effort. Higher settings yield more thorough responses at the cost of latency |
| Reasoning Summary | select | auto | Controls verbosity of reasoning explanations. auto lets the model decide; concise gives brief summaries; detailed offers in-depth explanations |
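With Streaming enabled (the default), responses arrive as server-sent events. A sketch of accumulating the streamed text, assuming OpenAI-style `data: {...}` chunks terminated by `data: [DONE]`:

```python
import json

def collect_stream(lines):
    """Join the content deltas from an OpenAI-style SSE stream."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank lines
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

# Two synthetic chunks followed by the terminator:
sample = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
# collect_stream(sample) -> "Hello, world"
```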
Use Cases
- Advanced reasoning and problem solving
- Conversational AI
- Technical and coding assistance
- Long-form text generation
- Math and logic tasks
- Research and analysis
Strengths & Limitations
| Strengths | Limitations |
|---|---|
| Excellent reasoning and chain-of-thought capability | Slightly slower than smaller distilled models |
| Lower GPU memory requirement compared to the full R1 model | Reasoning quality may vary in very complex tasks |
| Strong performance across technical and multilingual tasks | Function calling not supported |
| Open-source and suitable for on-prem deployment | |
Why Qubrid AI?
- No infrastructure setup: serverless API, pay only for what you use
- OpenAI-compatible: drop-in replacement using the same SDK, just swap the base URL
- Reasoning at scale: distilled R1 intelligence served with Qubrid's low-latency infrastructure
- Built-in Playground: prototype with system prompts and few-shot examples instantly at platform.qubrid.com
- Full observability: API logs and usage tracking built into the Qubrid dashboard
- Multi-language support: Python, JavaScript, Go, cURL out of the box
Resources
| Resource | Link |
|---|---|
| Qubrid Docs | docs.platform.qubrid.com |
| Playground | Try DeepSeek R1 Distill LLaMA 70B live |
| API Keys | Get your API Key |
| Hugging Face | deepseek-ai/deepseek-r1-distill-llama-70b |
| Discord | Join the Qubrid Community |
Built with ❤️ by Qubrid AI
Frontier models. Serverless infrastructure. Zero friction.